The three performance hits are enumerated in the previous question:
* By itself, an extra layer of indirection is small potatoes.
* Freestore allocations can be a big problem (a typical malloc slows down
as the number of small freestore allocations grows; OO software can easily
become 'freestore bound' unless you're careful).
* Extra dynamic binding comes from having a ptr rather than an object.
Whenever the C++ compiler can know an object's *exact* class, virtual fn
calls can be *statically* bound, which allows inlining. Inlining opens up
zillions (would you believe half a dozen? :-) of optimization opportunities
such as procedural integration, improved register lifetimes, etc. The C++
compiler can know an object's exact class in three circumstances: local
variables, global/static variables, and fully-contained subobjects.
Thus fully-contained subobjects allow significant optimizations that
wouldn't be possible under the 'subobjects-by-ptr' approach (this is the
main reason that languages which enforce reference-semantics have
'inherent' performance problems).
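
As a rough illustration of the three hits, here is a minimal sketch that
contrasts the two layouts. The class names (Engine, TurboEngine, CarByValue,
CarByPtr) are hypothetical and exist only for this example, and whether a
given compiler actually inlines the contained call depends on the compiler
and its optimization settings.

```cpp
#include <memory>
#include <utility>

// Hypothetical classes, invented for this sketch only.
class Engine {
public:
    virtual ~Engine() = default;
    virtual int power() const { return 100; }
};

class TurboEngine : public Engine {
public:
    int power() const override { return 150; }
};

// Fully-contained subobject: no extra indirection, no freestore
// allocation, and the exact class of engine_ is known to be Engine,
// so the virtual call can be statically bound (and possibly inlined).
class CarByValue {
public:
    int power() const { return engine_.power(); }
private:
    Engine engine_;
};

// Subobject-by-ptr: one freestore allocation per object, one extra
// indirection per access, and the pointee's exact class is not known
// at this point, so the call is normally dynamically bound.
class CarByPtr {
public:
    explicit CarByPtr(std::unique_ptr<Engine> e) : engine_(std::move(e)) {}
    int power() const { return engine_->power(); }
private:
    std::unique_ptr<Engine> engine_;
};
```

The by-ptr version buys flexibility (a CarByPtr can hold a TurboEngine or
any other derived class), and that flexibility is exactly what costs the
allocation, the indirection, and the dynamic binding.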